19 research outputs found

    Machine Learning for Ad Publishers in Real Time Bidding

    Get PDF

    Automated Reinforcement Learning: An Overview

    Get PDF
    Reinforcement Learning (RL) and, more recently, Deep Reinforcement Learning are popular methods for solving sequential decision-making problems modeled as Markov Decision Processes (MDPs). Modeling a problem as an RL task and selecting algorithms and hyper-parameters require careful consideration, as different configurations may yield completely different performance. These considerations are mainly the task of RL experts; however, RL is progressively becoming popular in other fields where researchers and system designers are not RL experts. Besides, many modeling decisions, such as defining the state and action spaces, the size of batches and the frequency of batch updating, and the number of timesteps, are typically made manually. For these reasons, automating the different components of the RL framework is of great importance, and it has attracted much attention in recent years. Automated RL provides a framework in which the different components of RL, including MDP modeling, algorithm selection, and hyper-parameter optimization, are modeled and defined automatically. In this article, we explore the literature and present recent work that can be used in automated RL. Moreover, we discuss the challenges, open questions, and research directions in AutoRL.
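    As a rough illustration of what such automation involves, the sketch below wires the three components named above (MDP/training design choices, algorithm selection, and hyper-parameter optimization) into a single search loop. It is a minimal, hypothetical example: the search space, the random-search strategy, and the evaluate_config stub are illustrative assumptions, not a method from the surveyed literature.

```python
# A minimal, illustrative sketch of an AutoRL-style search loop; the candidate
# algorithms, search space, and evaluate_config placeholder are hypothetical
# and not taken from the surveyed papers.
import random

SEARCH_SPACE = {
    "algorithm": ["DQN", "PPO", "A2C"],      # algorithm selection
    "learning_rate": [1e-4, 3e-4, 1e-3],     # hyper-parameters
    "batch_size": [32, 64, 128],
    "update_frequency": [1, 4, 8],           # training/MDP design choices
}

def evaluate_config(config, n_episodes=10):
    """Placeholder for training an agent with `config` and returning its mean return."""
    random.seed(hash(frozenset(config.items())) % (2**32))
    return random.uniform(0.0, 1.0)  # stand-in for measured performance

def random_search(n_trials=20):
    """Pick the best configuration found by plain random search."""
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        score = evaluate_config(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

if __name__ == "__main__":
    config, score = random_search()
    print("selected configuration:", config, "score:", round(score, 3))
```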

    An Automated Deep Reinforcement Learning Pipeline for Dynamic Pricing

    Get PDF
    A dynamic pricing problem is difficult due to the highly dynamic environment and unknown demand distributions. In this article, we propose a deep reinforcement learning (DRL) framework, which is a pipeline that automatically defines the DRL components for solving a dynamic pricing problem. The automated DRL pipeline is necessary because the DRL framework can be designed in numerous ways, and manually finding optimal configurations is tedious. The levels of automation make DRL usable by nonexperts for dynamic pricing. Our DRL pipeline covers three steps of DRL design: Markov decision process modeling, algorithm selection, and hyperparameter optimization. It starts by transforming the available information into a state representation and defining the reward function using a reward-shaping approach. Then, the hyperparameters are tuned using a novel hyperparameter optimization method that integrates Bayesian optimization and the selection operator of the genetic algorithm. We apply our DRL pipeline to reserve price optimization problems in online advertising as a case study. We show that, using the DRL configuration obtained by our pipeline, a pricing policy is obtained whose revenue is significantly higher than that of the benchmark methods. The evaluation is performed by developing a simulation of the real-time bidding environment that makes exploration possible for the reinforcement learning agent.
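    The hyperparameter-tuning step described above combines Bayesian optimization with the selection operator of a genetic algorithm. The sketch below only illustrates the general shape of such a hybrid loop: a placeholder objective stands in for training the pricing agent, a simple perturbation step stands in for the Bayesian proposal model, and truncation selection plays the role of the GA selection operator. All function names, bounds, and constants are assumptions, not the authors' implementation.

```python
# A highly simplified sketch of a "Bayesian optimization + GA selection" style
# hyper-parameter search. The surrogate is a trivial placeholder (not a Gaussian
# process), and every name and constant here is an illustrative assumption.
import random

BOUNDS = {"learning_rate": (1e-5, 1e-2), "gamma": (0.9, 0.999)}

def sample_candidate():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}

def objective(params):
    """Placeholder for training a DRL pricing agent and returning its revenue."""
    return -((params["learning_rate"] - 3e-4) ** 2) - (params["gamma"] - 0.99) ** 2

def truncation_selection(population, k):
    """GA-style selection operator: keep the k best-scoring candidates."""
    return sorted(population, key=lambda c: c["score"], reverse=True)[:k]

def hybrid_search(n_generations=5, pop_size=8, survivors=3):
    population = [{"params": sample_candidate()} for _ in range(pop_size)]
    for c in population:
        c["score"] = objective(c["params"])
    for _ in range(n_generations):
        # In the paper this proposal step would come from Bayesian optimization;
        # here the surviving candidates are simply perturbed as a stand-in.
        parents = truncation_selection(population, survivors)
        children = []
        for p in parents:
            child = {k: min(max(v * random.uniform(0.8, 1.2), BOUNDS[k][0]), BOUNDS[k][1])
                     for k, v in p["params"].items()}
            children.append({"params": child, "score": objective(child)})
        population = parents + children
    return truncation_selection(population, 1)[0]

print(hybrid_search())
```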

    Deep Reinforcement Learning for a Multi-Objective Online Order Batching Problem

    Get PDF
    On-time delivery and low service costs are two important performance metrics in warehousing operations. This paper proposes a Deep Reinforcement Learning (DRL) based approach to solve the online Order Batching and Sequencing Problem (OBSP) and optimize these two objectives. To learn how to balance the trade-off between the two objectives, we introduce a Bayesian optimization framework to shape the reward function of the DRL agent, such that the influence of learning on each objective is adjusted to different environments. We compare our approach with several heuristics on problem instances of real-world size, where thousands of orders arrive dynamically per hour. We show that the Proximal Policy Optimization (PPO) algorithm with Bayesian optimization outperforms the heuristics on both objectives in all tested scenarios. In addition, it finds different weights for the components of the reward function in different scenarios, indicating its capability of learning how to set the importance of the two objectives under different environments. We also provide a policy analysis of the learned DRL agent, where a decision tree is used to infer decision rules and enable the interpretability of the DRL approach.
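    To make the reward-shaping idea concrete, the sketch below combines the two objectives (tardiness and service cost) into a single weighted reward and tunes the weights with an outer search loop. The weight grid, the grid search standing in for Bayesian optimization, and the evaluate_policy stub are illustrative assumptions, not the authors' setup.

```python
# A minimal sketch of weighting two warehouse objectives in a scalar reward and
# tuning the weights with an outer loop. Grid search stands in for Bayesian
# optimization, and evaluate_policy is a placeholder for training PPO.
import itertools
import random

def shaped_reward(tardiness, service_cost, w_tardiness, w_cost):
    """Scalar reward combining the two objectives (both are to be minimized)."""
    return -(w_tardiness * tardiness + w_cost * service_cost)

def evaluate_policy(w_tardiness, w_cost, n_episodes=20):
    """Placeholder for training an agent with these reward weights and measuring both KPIs."""
    random.seed(int(w_tardiness * 100) * 7 + int(w_cost * 100))
    mean_tardiness = random.uniform(0, 10) / (1 + w_tardiness)
    mean_cost = random.uniform(0, 10) / (1 + w_cost)
    return mean_tardiness, mean_cost

best = None
for w_t, w_c in itertools.product([0.25, 0.5, 1.0, 2.0], repeat=2):
    tardiness, cost = evaluate_policy(w_t, w_c)
    score = -(tardiness + cost)  # simple aggregate used only for this illustration
    if best is None or score > best[0]:
        best = (score, w_t, w_c)
print("selected weights:", best[1:], "aggregate score:", round(best[0], 3))
```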

    Dynamic Ad Network Ordering Method Using Reinforcement Learning

    Get PDF
    Real-time bidding is one of the most popular ways of selling impressions in online advertising, where online ad publishers allocate blocks on their websites to sell in online auctions. In real-time bidding, ad networks connect publishers and advertisers, and there are many ad networks for publishers to choose from. A possible approach for selecting ad networks and sending ad requests is the Waterfall Strategy, in which ad networks are selected sequentially. The ordering of the ad networks is very important for publishers, and finding the ordering that provides maximum revenue is a hard problem due to the highly dynamic environment. In this paper, we propose a dynamic ad network ordering method that finds the best ordering of ad networks for publishers that opt for the Waterfall Strategy. The method consists of two steps. The first step is a prediction model that is trained on real-time bidding historical data and provides a revenue estimate for each impression. These estimates are used as initial values for the Q-table in the second step. The second step is based on Reinforcement Learning and improves on the output of the prediction model. By calculating the revenue of our method and comparing it with the revenue of a fixed, predefined ordering, we show that the proposed dynamic ad network ordering method increases publishers' revenue.
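    A minimal sketch of the two-step idea is given below: a Q-table over (waterfall position, ad network) is warm-started with revenue estimates from a prediction model and then refined with tabular Q-learning against a toy auction simulator. The network names, fill probabilities, and learning constants are illustrative assumptions rather than values from the paper.

```python
# A simplified sketch: warm-start a Q-table with predicted revenues (step 1),
# then refine it with tabular Q-learning while ordering ad networks in a
# Waterfall-style sequence (step 2). The simulator and constants are assumptions.
import random

AD_NETWORKS = ["net_a", "net_b", "net_c"]
PREDICTED_REVENUE = {"net_a": 0.40, "net_b": 0.55, "net_c": 0.30}  # output of step 1
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q-table over (position in the waterfall, ad network), initialized with the
# predicted revenues instead of zeros.
Q = {(pos, net): PREDICTED_REVENUE[net]
     for pos in range(len(AD_NETWORKS)) for net in AD_NETWORKS}

def simulate_auction(net):
    """Placeholder: whether the network fills the impression and at what revenue."""
    fill_prob = {"net_a": 0.5, "net_b": 0.4, "net_c": 0.6}[net]
    return random.random() < fill_prob, PREDICTED_REVENUE[net] * random.uniform(0.5, 1.5)

for _ in range(10_000):                      # impressions
    remaining = list(AD_NETWORKS)
    for pos in range(len(AD_NETWORKS)):
        if random.random() < EPSILON:
            net = random.choice(remaining)   # explore
        else:
            net = max(remaining, key=lambda n: Q[(pos, n)])  # exploit
        sold, revenue = simulate_auction(net)
        reward = revenue if sold else 0.0
        future = 0.0 if sold or pos == len(AD_NETWORKS) - 1 else \
            max(Q[(pos + 1, n)] for n in remaining if n != net)
        Q[(pos, net)] += ALPHA * (reward + GAMMA * future - Q[(pos, net)])
        remaining.remove(net)
        if sold:
            break

print("learned first-position ordering:",
      sorted(AD_NETWORKS, key=lambda n: Q[(0, n)], reverse=True))
```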

    A Reward Shaping Approach for Reserve Price Optimization using Deep Reinforcement Learning

    Get PDF
    Real-time bidding is the process of selling and buying online advertisements in real-time auctions. Real-time auctions are performed by header bidding partners or ad exchanges to sell publishers' ad placements. Ad exchanges run second-price auctions, and a reserve price must be set for each ad placement or impression. This reserve price is normally determined by the bids of the header bidding partners. However, the ad exchange may outbid higher reserve prices, so optimizing this value largely affects the revenue. In this paper, we propose a deep reinforcement learning approach for adjusting the reserve price of individual impressions using contextual information. Normally, ad exchanges do not return any information about the auction except the sold/unsold status. This binary feedback is not suitable for maximizing revenue because it contains no explicit information about the revenue. In order to enrich the reward function, we develop a novel reward shaping approach that provides an informative reward signal for the reinforcement learning agent. In this approach, different intervals of the reserve price receive different weights, and the reward value of each interval is learned through a search procedure. Using a simulator, we test our method on a set of impressions. The results show the superior performance of our proposed method in terms of revenue compared with the baselines.
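    The sketch below illustrates the interval-based shaping idea on its own: the reserve-price range is split into intervals, each interval carries a weight, and the binary sold/unsold feedback is replaced by a weighted signal. The interval edges, the weights, and the sign convention for unsold impressions are illustrative assumptions, not the values learned by the search procedure in the paper.

```python
# A minimal sketch of interval-based reward shaping over binary sold/unsold
# feedback. The bucket edges and weights are illustrative assumptions, not the
# learned values from the paper.
INTERVAL_EDGES = [0.0, 0.5, 1.0, 2.0, 5.0]   # reserve-price bucket edges (e.g. CPM)
INTERVAL_WEIGHTS = [0.2, 0.6, 1.0, 0.4]      # one weight per interval, found by search

def interval_index(reserve_price):
    """Map a reserve price to its interval; prices above the last edge fall in the last bucket."""
    for i in range(len(INTERVAL_EDGES) - 1):
        if INTERVAL_EDGES[i] <= reserve_price < INTERVAL_EDGES[i + 1]:
            return i
    return len(INTERVAL_WEIGHTS) - 1

def shaped_reward(reserve_price, sold):
    """Replace the raw 0/1 feedback with a signal weighted by the price interval."""
    w = INTERVAL_WEIGHTS[interval_index(reserve_price)]
    return w if sold else -w  # sign convention for unsold impressions is an assumption

print(shaped_reward(1.2, sold=True), shaped_reward(3.0, sold=False))
```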

    A State Aggregation Approach for Solving Knapsack Problem with Deep Reinforcement Learning

    No full text
    This paper proposes a Deep Reinforcement Learning (DRL) approach for solving the knapsack problem. The proposed method consists of a state aggregation step, based on tabular reinforcement learning, to extract features and construct states. The state aggregation policy is applied to each instance of the knapsack problem and is used with the Advantage Actor Critic (A2C) algorithm to train a policy through which items are sequentially selected at each time step. The method is a constructive solution approach, and the process of selecting items is repeated until the final solution is obtained. The experiments show that our approach provides close-to-optimal solutions for all tested instances, outperforms the greedy algorithm, and is able to handle larger instances and is more flexible than an existing DRL approach. In addition, the results demonstrate that the proposed model with the state aggregation strategy not only gives better solutions but also learns in fewer timesteps than the one without state aggregation.
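    The constructive selection process can be sketched as follows: items are chosen one at a time, guided by a policy over an aggregated (coarsened) state, until no remaining item fits. Here a simple value-density heuristic stands in for the trained A2C policy, and the capacity-bucket state feature is an illustrative assumption about what an aggregated state might look like.

```python
# A small sketch of a constructive knapsack solution: items are selected one at
# a time until no remaining item fits. The value-density heuristic stands in
# for the learned A2C policy; the aggregated state is an illustrative choice.
ITEMS = [(4, 10), (3, 7), (5, 12), (2, 3), (6, 14)]   # (weight, value)
CAPACITY = 10

def aggregated_state(remaining_capacity, capacity, n_buckets=5):
    """Coarse state feature: which remaining-capacity bucket we are in."""
    return min(n_buckets - 1, int(n_buckets * remaining_capacity / capacity))

def policy(state, candidates):
    """Stand-in for the learned policy: pick the highest value/weight item."""
    return max(candidates, key=lambda i: ITEMS[i][1] / ITEMS[i][0])

remaining, selected = CAPACITY, []
candidates = set(range(len(ITEMS)))
while True:
    feasible = [i for i in candidates if ITEMS[i][0] <= remaining]
    if not feasible:
        break                                   # no remaining item fits: solution is complete
    choice = policy(aggregated_state(remaining, CAPACITY), feasible)
    selected.append(choice)
    remaining -= ITEMS[choice][0]
    candidates.remove(choice)

print("selected items:", selected, "total value:", sum(ITEMS[i][1] for i in selected))
```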

    Analisis pengaruh guncangan makroekonomi terhadap integrasi pasar modal di ASEAN 5 (Analysis of the effect of macroeconomic shocks on capital market integration in ASEAN 5)

    No full text
    Globalization gives investors many options for investing in any capital market. The implementation of the ASEAN Economic Community (AEC) is an attempt to increase interaction in the economic field, which will increase cross-border transactions between ASEAN countries. An increase in cross-border transactions leads to capital flowing in and out of a country, which can lead to the integration of capital markets. Stocks are among the profitable investment instruments in capital markets, and one of the indicators used to observe stock movements is the stock price index. A composite stock price index reflects a country's economy and can be affected by macroeconomic variables. This study was conducted to examine the impact of macroeconomic shocks on capital market integration in ASEAN 5. The objectives of this study were to (1) analyze the causality between stock indices in the ASEAN 5 capital markets, (2) analyze the impact of macroeconomic shocks on the integration of stock indices in the ASEAN 5 capital markets, and (3) analyze the most influential variables in the integration of stock indices in the ASEAN 5 capital markets. The data used in this research are secondary data. The stock price indices used are those of the five ASEAN countries, namely Indonesia (IHSG), Malaysia (KLCI), Singapore (STI), Thailand (SETI), and the Philippines (PSEI). The macroeconomic variables used are Indonesian variables, namely industrial production (IPI), inflation (INFI), the interest rate (SBI), and the exchange rate (KURSI). The method used is VAR/VECM analysis with the impulse response function (IRF), forecast error variance decomposition (FEVD), and Granger causality, over the study period 2001-2016. The results indicate that there are both two-way and one-way causal relationships among the ASEAN 5 indices. IHSG and STI have a two-way causality, and both indices have a one-way causality to PSEI, KLCI, and SETI. These results show that IHSG and STI hold a strong position and influence among the ASEAN 5 capital markets. Inflation (INFI) has a negative relationship with all ASEAN 5 indices, because INFI responds positively to its own shock, which makes the response of the ASEAN 5 indices negative. The Indonesian exchange rate (KURSI) has a negative relationship with the ASEAN 5 indices, because KURSI responds with a decrease to its own shock, which makes the ASEAN 5 indices respond positively to shocks in KURSI. The interest rate (SBI) has a negative relationship with the ASEAN 5 indices, because SBI responds positively to its own shock, which makes the response of the ASEAN 5 indices to SBI negative. Industrial production (IPI) has a positive relationship with all ASEAN 5 indices, because IPI responds with a reduction to its own shock, which makes the ASEAN 5 indices respond negatively to shocks in IPI. Shocks to the macroeconomic variables contribute more than the stock indices themselves. The movements of IHSG and KLCI are mainly driven by Indonesian inflation, while the movements of PSEI, SETI, and STI are mainly driven by the Indonesian interest rate. Indonesian inflation (INFI) contributes the most to the movement of the Malaysian stock index (KLCI). Indonesian industrial production (IPI) contributes the most to the movement of the Indonesian stock index (IHSG). The Indonesian exchange rate (KURSI) contributes the most to the movement of the Indonesian stock index (IHSG). The Indonesian interest rate (SBI) contributes the most to the movement of all stock indices in the ASEAN 5 capital markets.
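    As a small illustration of the pairwise Granger-causality tests used in the study, the snippet below runs statsmodels' test on synthetic series, since the actual index data (IHSG, STI, etc.) is not reproduced here; the lag choice and column layout are illustrative.

```python
# A minimal sketch of a pairwise Granger-causality test, run on synthetic data
# because the study's index series are not reproduced here.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 200
x = np.cumsum(rng.normal(size=n))                    # stand-in for one index series
y = np.roll(x, 1) + rng.normal(scale=0.5, size=n)    # series partly driven by lagged x

# Test whether the second column (x) Granger-causes the first column (y).
data = np.column_stack([y, x])[1:]                   # drop the row affected by np.roll
results = grangercausalitytests(data, maxlag=2)
for lag, res in results.items():
    fstat, pvalue = res[0]["ssr_ftest"][:2]
    print(f"lag {lag}: F = {fstat:.2f}, p = {pvalue:.4f}")
```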